Missing Value Estimation Based on Dynamic Attribute Selection

Authors

  • K. C. Lee
  • J. S. Park
  • Y. S. Kim
  • Yung-Tai Byun
Abstract

Raw data used in data mining often contain missing information, which inevitably degrades the quality of the derived knowledge. This paper proposes a new method for estimating missing attribute values. The method selects attributes one at a time using attribute-group mutual information, calculated by flattening the already selected attributes. As each new attribute is added, its missing values are filled in by generating a decision tree, so the previously filled-in values are naturally utilized. This ordered estimation of missing values is compared with several conventional methods, including Lobo's ordered estimation, which uses a static ranking of attributes. Experimental results show that the method yields good recognition ratios in almost all domains with many missing values.
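The selection-and-fill loop described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the use of `None` as the missing-value marker, and the choice to include the class label among the predictors are all hypothetical. In particular, the decision tree is replaced here by a much simpler stand-in (a majority vote among rows that share the same class label and previously selected attribute values), to keep the sketch dependency-free.

```python
from collections import Counter
from math import log2

MISSING = None  # assumed marker for a missing value


def mutual_information(xs, ys):
    """I(X;Y) in bits from paired samples; pairs with a missing x are skipped."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not MISSING]
    n = len(pairs)
    if n == 0:
        return 0.0
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def flatten(rows, attrs):
    """Treat a group of attributes as one compound attribute (a tuple per row)."""
    out = []
    for r in rows:
        vals = tuple(r[a] for a in attrs)
        out.append(MISSING if MISSING in vals else vals)
    return out


def fill_missing(rows, target, predictors):
    """Stand-in for the decision tree: majority vote among matching rows."""
    known = [r for r in rows if r[target] is not MISSING]
    if not known:
        return  # nothing to learn from
    overall = Counter(r[target] for r in known).most_common(1)[0][0]
    by_key = {}
    for r in known:
        key = tuple(r[p] for p in predictors)
        by_key.setdefault(key, Counter())[r[target]] += 1
    for r in rows:
        if r[target] is MISSING:
            votes = by_key.get(tuple(r[p] for p in predictors))
            r[target] = votes.most_common(1)[0][0] if votes else overall


def ordered_estimation(rows, attrs, label):
    """Greedily pick attributes by group mutual information, filling as we go."""
    rows = [dict(r) for r in rows]  # work on a copy
    labels = [r[label] for r in rows]
    selected, remaining = [], list(attrs)
    while remaining:
        # The score is recomputed every round with the already selected
        # attributes flattened in, so the ranking is dynamic, not static.
        best = max(
            remaining,
            key=lambda a: mutual_information(flatten(rows, selected + [a]), labels),
        )
        # Assumption: the class label and earlier attributes act as predictors.
        fill_missing(rows, best, [label] + selected)
        selected.append(best)
        remaining.remove(best)
    return selected, rows
```

Calling `ordered_estimation(rows, ["a", "b"], "y")` on a list of dictionaries returns the selection order together with a filled copy of the rows. Because each attribute is completed before the next is scored, values estimated earlier feed into later estimates, which is the property the abstract highlights.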

Similar resources

Missing Value Imputation Method Based on Density Clustering and Grey Relational Analysis

In computer-aided medical diagnosis, missing attribute values in many medical data sets pose a great challenge to data mining. To solve this problem, this paper proposes a method based on density clustering and grey relational analysis, which provides an effective solution for missing medical data. The method uses the characteristic and degree of data samples dynamic relation a...

Predicting Missing Attribute Values Using k-Means Clustering

Problem statement: Predicting the values of missing attributes is an important data preprocessing problem in data mining and knowledge discovery tasks. Several methods have been proposed to treat missing data, and the most frequently used is deleting instances that contain at least one missing feature value. When the dataset has minimum number of missing attribute values then we can negle...

Missing-value estimation using linear and non-linear regression with Bayesian gene selection

Motivation: Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value est...

Modified Deviation Approach to Deal with Missing Attribute values in Data Mining with Different percentage of Missing Values

In practice, an information system with missing attribute values hampers accurate estimation in data mining. If missing attribute values can be predicted in the pre-processing stage of data mining, accuracy improves and existing data mining algorithms can be applied to the complete data. In this work different type of methods available to handle incomplete in...

Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain some noisy information or outliers, the estimations of the missing values may...

Publication date: 2000